Code Optimization of Polynomial Approximation Functions on Clustered Instruction-level Parallelism Processors
Authors
Abstract
In this paper, we propose a general code optimization method for implementing polynomial approximation functions on clustered instruction-level parallelism (ILP) processors. The method first introduces a parallel algorithm with minimized data dependency. It then schedules and maps the data dependency graph (DDG) constructed from that algorithm onto the appropriate clusters and functional units of a specific clustered ILP processor, using the proposed parallel scheduling and mapping (PSAM) algorithm. The PSAM algorithm prioritizes nodes on the critical path to minimize the total schedule length and ensures that the resulting schedule satisfies the resource constraints imposed by the target clustered ILP processor. As a result, our method produces schedule lengths close to the lower bounds determined by the critical path lengths of the DDGs. Experimental results for typical polynomial mathematical functions on the TI ’C67x DSP show that the proposed method achieves significant performance improvement over the traditional computation method.
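The abstract does not spell out the parallel algorithm, so the following is only an illustrative sketch of the general idea of shortening dependency chains in polynomial evaluation. It contrasts Horner's rule, where every multiply-add depends on the previous one, with Estrin's scheme, a well-known alternative that is swapped in here as an assumption and is not claimed to be the paper's method. The function names `horner`, `estrin`, and `estrin_levels` are hypothetical.

```python
def horner(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) with Horner's rule.

    Every multiply-add uses the previous result, so the data
    dependency graph is a single serial chain.
    """
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def estrin(coeffs, x):
    """Evaluate the same polynomial with Estrin's scheme (assumed stand-in).

    Each pass combines adjacent pairs as lo + hi * power. The pairs in
    one pass do not depend on each other, so a multi-issue processor
    could in principle execute them in parallel.
    """
    terms = list(coeffs)
    if not terms:
        return 0.0
    power = x
    while len(terms) > 1:
        if len(terms) % 2:
            terms.append(0.0)  # pad so every term has a partner
        terms = [terms[i] + terms[i + 1] * power
                 for i in range(0, len(terms), 2)]
        power = power * power  # x, x^2, x^4, ...
    return terms[0]


def estrin_levels(n_coeffs):
    """Count the dependent passes Estrin's scheme needs for n_coeffs terms."""
    levels = 0
    while n_coeffs > 1:
        n_coeffs = (n_coeffs + 1) // 2
        levels += 1
    return levels


if __name__ == "__main__":
    coeffs = [1, 2, 3, 4, 5, 6, 7, 8]  # 1 + 2x + ... + 8x^7
    print(horner(coeffs, 2.0), estrin(coeffs, 2.0))
    print(estrin_levels(len(coeffs)))
```

For this degree-7 polynomial, Horner's chain has seven dependent multiply-add steps after loading the leading coefficient, while the Estrin sketch needs three dependent passes. How closely a real schedule approaches that bound would depend on the functional units and clusters available, which is the part the PSAM algorithm in the abstract addresses.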
Similar resources
Using Profile Information to Assist Advanced Compiler Optimization and Scheduling
Compilers for superscalar and VLIW processors must expose sufficient instruction-level parallelism in order to achieve high performance. Compile-time code transformations which expose instruction-level parallelism typically take into account the constraints imposed by all execution scenarios in the program. However, there are additional opportunities to increase instruction-level parallelism alon...
Increasing the Instruction Fetch Rate via Block-Structured Instruction Set Architectures - Microarchitecture, 1996., IEEE/ACM International Symposium on
To exploit larger amounts of instruction-level parallelism, processors are being built with wider issue widths and larger numbers of functional units. Instruction fetch rate must also be increased in order to effectively exploit the performance potential of such processors. Block-structured ISAs provide an effective means of increasing the instruction fetch rate. We define an optimization, calle...
Using Profile Information to Assist Advanced Compiler Optimization and Scheduling
Compilers for superscalar and VLIW processors must expose sufficient instruction-level parallelism in order to achieve high performance. Compile-time code transformations which expose instruction-level parallelism typically take into account the constraints imposed by all execution scenarios in the program. However, there are additional opportunities to increase instruction-level parallelism along ...
De-pipeline a software-pipelined loop
1 Dept. of Computer Science, The William Paterson University of New Jersey, Wayne, NJ 07470, USA 2 Wireless Speech and Data Processing, Nortel Networks, Montreal, QC, Canada, H3E 1H6 Abstract Software pipelining is a loop optimization technique that has been widely implemented in modern optimizing compilers. In order to fully utilize the instruction level parallelism of the recent VLIW DSP proc...
Stream Execution on Embedded Wide-Issue Clustered VLIW Architectures
Very long instruction word (VLIW)-based processors have become widely adopted as a basic building block in modern System-on-Chip designs. Advances in clustered VLIW architectures have extended the scalability of the VLIW architecture paradigm to a large number of functional units and very wide issue widths. A central challenge with wide-issue clustered VLIW architecture is the availability of pr...